Monitoring Extreme-scale Lustre Toolkit

Authors

  • Michael J. Brim
  • Joshua K. Lothian
Abstract

We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.

Keywords: Lustre; performance monitoring; overlay network; data aggregation

Similar resources

LIOProf: Exposing Lustre File System Behavior for I/O Middleware

As parallel I/O subsystem in large-scale supercomputers is becoming complex due to multiple levels of software libraries, hardware layers, and various I/O patterns, detecting performance bottlenecks is a critical requirement. While there exist a few tools to characterize application I/O, robust analysis of file system behavior and associating file-system feedback with application I/O patterns a...

A Toolkit for Storage QoS Provisioning for Data-Intensive Applications

This paper describes a programming toolkit developed in the PL-Grid project, named QStorMan, which supports storage QoS provisioning for data-intensive applications in distributed environments. QStorMan exploits knowledge-oriented methods for matching storage resources to non-functional requirements, which are defined for a data-intensive application. In order to support various usage scenarios,...

Resource-Bounded Runtime Verification of Java Programs with Real-Time Properties

Given the intractability of exhaustively verifying software, the use of runtime verification, to verify single execution paths at runtime, is becoming increasingly popular. Undoubtedly, the overhead introduced by runtime verification is a concern for system developers planning to introduce this technique in their work. By using Lustre to write security-critical properties, we exploit the langua...

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment

HSM functionality has been available with Lustre for several releases and is an important aspect for HPC systems to provide data protection, space savings, and cost efficiencies, and is especially important to the NCSA Blue Waters system. Very few operational HPC centers have deployed HSM with Lustre, and even fewer at the scale of Blue Waters. This paper will describe the goals for HSM in gene...

Translation, Adaptation and Validation of Referral Systems Assessment and Monitoring Toolkit for the Family Physicians Program in Iran

Background and purpose: Studies on the function of referral system in Iran had not covered all aspects and structures of the referral system. This could be due to lack of an appropriate tool that could investigate referral system in Iran. The current study was done to translate and investigate the validation of Referral Systems Assessment and Monitoring (RSAM) Toolkit based on family physician ...

Journal title:
  • CoRR

Volume abs/1504.06836

Publication date 2015